Semantic Frame Annotation on the French MEDIA corpus
Abstract
This paper introduces a knowledge representation formalism used to annotate the French MEDIA dialogue corpus with high-level semantic structures. The semantic annotation, developed following the Berkeley FrameNet paradigm, is incremental and partially automated. We describe an automatic interpretation process that composes semantic structures from basic semantic constituents using patterns over words and constituents. This process includes procedures that perform semantic composition and generate frame hypotheses by inference. The MEDIA corpus is a French dialogue corpus recorded with a Wizard of Oz system simulating a telephone server for tourist information and hotel booking. It has been manually transcribed and annotated at the word and semantic constituent levels. These levels support the automatic interpretation process, which produces a high-level semantic frame annotation. The Frame-based Knowledge Source we built contains frame definitions and composition rules. Finally, we report results obtained on the automatically derived annotation.
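To make the idea of rule-based composition plus inference more concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' Frame-based Knowledge Source: the frame names (`Reservation`, `Location`), the frame-element names, the `CompositionRule` format and the `ATTACH` inference table are all hypothetical, and the concept labels are only meant to resemble MEDIA-style concept names. The sketch also simplifies the described process by matching on constituent concepts alone, ignoring word patterns.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Constituent:
    """A basic semantic constituent: a word span tagged with a concept and a value."""
    words: str
    concept: str
    value: str


@dataclass
class Frame:
    """A frame instance; elements map frame-element names to values or sub-frames."""
    name: str
    elements: Dict[str, object] = field(default_factory=dict)


@dataclass
class CompositionRule:
    """Hypothetical rule: if `trigger` is present, build `frame` and fill its elements."""
    trigger: str
    frame: str
    element_map: Dict[str, str]  # concept -> frame-element name


# Hypothetical knowledge source (illustrative names only, not taken from the paper).
RULES: List[CompositionRule] = [
    CompositionRule("command-tache", "Reservation",
                    {"nombre-chambre": "room_count", "temps-date": "date"}),
    CompositionRule("localisation-ville", "Location",
                    {"localisation-ville": "city"}),
]

# Hypothetical inference table: (parent frame, child frame) -> frame element of parent.
ATTACH: Dict[Tuple[str, str], str] = {("Reservation", "Location"): "place"}


def compose(constituents: List[Constituent],
            rules: List[CompositionRule] = RULES,
            attach: Dict[Tuple[str, str], str] = ATTACH) -> List[Frame]:
    """Build frames from constituents, then nest co-occurring sub-frames by inference."""
    by_concept = {c.concept: c for c in constituents}
    frames: List[Frame] = []
    for rule in rules:
        if rule.trigger not in by_concept:
            continue  # this rule does not fire on the utterance
        frame = Frame(rule.frame)
        for concept, element in rule.element_map.items():
            if concept in by_concept:
                frame.elements[element] = by_concept[concept].value
        frames.append(frame)

    # Inference step: attach a child frame to a co-occurring parent frame,
    # removing the child from the list of top-level frames.
    roots = list(frames)
    for (parent_name, child_name), element in attach.items():
        parents = [f for f in roots if f.name == parent_name]
        children = [f for f in roots if f.name == child_name]
        if parents and children:
            child = children[0]
            parents[0].elements[element] = child
            roots = [f for f in roots if f is not child]
    return roots


# Hypothetical annotated utterance: "je voudrais réserver deux chambres à Paris le 5 mars"
utterance = [
    Constituent("réserver", "command-tache", "reservation"),
    Constituent("deux", "nombre-chambre", "2"),
    Constituent("à Paris", "localisation-ville", "Paris"),
    Constituent("le 5 mars", "temps-date", "05/03"),
]
result = compose(utterance)
print(result)
```

Keeping rules and inference tables as plain data, separate from the `compose` procedure, mirrors the abstract's split between a knowledge source (frame definitions and composition rules) and the interpretation process that applies it.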